perf: avoid unnecessary base64 conversion for aiocqhttp image/record #6850

Open
he-yufeng wants to merge 2 commits into AstrBotDevs:master from he-yufeng:perf/aiocqhttp-image-passthrough
@he-yufeng he-yufeng commented Mar 23, 2026

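Problem

When _from_segment_to_dict handles Image / Record segments, it unconditionally calls convert_to_base64(). For images from an http:// source, this means downloading the file locally and then base64-encoding it; for file:// local files, the whole file is still read into memory to be encoded. With large images, both the memory footprint and the CPU overhead are significant.

In practice, NapCat and other OneBot protocol endpoints already support the file://, http(s)://, and base64:// formats themselves, so the extra conversion is unnecessary.

Changes

Added a _resolve_file_uri() method that checks the format of the segment's file field before sending:

| Input format | Handling |
| --- | --- |
| file:///path/to/img | Passed through as-is |
| http(s)://url | Passed through as-is |
| base64://data | Passed through as-is |
| Local absolute path (bare path) | Converted to a file:// URI, then passed through |
| Anything else | Falls back to base64 encoding (safety net) |

This way only unrecognized formats go through base64, and the common cases avoid the unnecessary memory copy and encoding.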
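
The decision table above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the standalone function name `resolve_file_uri`, the `PASSTHROUGH_PREFIXES` constant, and the convention of returning `None` to mean "fall back to base64" are assumptions made for the example. The existence check mirrors the `os.path.isabs(raw) and os.path.exists(raw)` condition quoted later in the review.

```python
import os
import pathlib
from typing import Optional

# Prefixes that, per the PR description, the protocol endpoint accepts as-is.
PASSTHROUGH_PREFIXES = ("file://", "http://", "https://", "base64://")


def resolve_file_uri(raw: Optional[str]) -> Optional[str]:
    """Return a value to pass through unchanged, or None to signal base64 fallback.

    Hypothetical helper: the name and the None convention are assumptions.
    """
    if not raw:
        return None
    if raw.startswith(PASSTHROUGH_PREFIXES):
        return raw
    # Bare absolute path: convert to a file:// URI so the endpoint reads it itself.
    if os.path.isabs(raw) and os.path.exists(raw):
        return pathlib.Path(raw).as_uri()
    # Anything else (relative paths, unknown schemes) falls back to base64.
    return None
```

A caller would then invoke the existing base64 conversion only when this helper returns `None`.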

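Tests

  • file:/// path → passed through unchanged, no file read triggered
  • http:// URL → passed through unchanged, no download triggered
  • base64:// data → passed through unchanged
  • Bare absolute path → converted to a file:// URI
  • Empty value / invalid format → falls back to base64 (behavior unchanged)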

Closes #6717

Summary by Sourcery

Optimize handling of Image and Record segments in aiocqhttp events to avoid unnecessary base64 conversions when a directly usable file URI is available.

Enhancements:

  • Add logic to detect and directly pass through file://, http(s)://, and base64:// URIs for image and record segments instead of always converting to base64.
  • Support converting absolute local file paths to file:// URIs for protocol-side handling, falling back to base64 only for unsupported formats.

…sending

NapCat and other OneBot protocol endpoints natively support file://,
http(s)://, and base64:// URIs. Previously _from_segment_to_dict always
called convert_to_base64() which forces downloading HTTP images and
reading local files into memory just to encode them — wasting CPU,
memory, and bandwidth.

Now we check the source URI first and pass it through directly when
the protocol endpoint can handle it. Only truly unknown formats fall
back to base64 encoding.

Closes AstrBotDevs#6717
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Mar 23, 2026
@gemini-code-assist

Summary of Changes

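Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to optimize the performance and resource consumption of the aiocqhttp platform when handling Image / Record message segments. By introducing file URI resolution logic, it avoids redundant Base64 conversion before sending for formats the protocol endpoint already supports (file://, http(s)://, base64://), which significantly reduces memory usage and CPU overhead for large files and improves message-handling efficiency.

Highlights

  • New file URI resolution method: introduces the _resolve_file_uri static method, which recognizes and handles the file URI formats found in Image / Record message segments.
  • Optimized Base64 conversion logic: _from_segment_to_dict now resolves the file URI through _resolve_file_uri first, avoiding unnecessary Base64 encoding for formats the protocol endpoint supports natively (file://, http(s)://, base64://).
  • Bare path conversion: local absolute paths (bare paths) are now converted automatically to file:// URIs so the protocol endpoint can handle them directly.
  • Fallback mechanism: the traditional Base64 encoding path is used only when the file URI format cannot be recognized, which preserves compatibility.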

@dosubot dosubot bot added the area:platform The bug / feature is about IM platform adapter, such as QQ, Lark, Telegram, WebChat and so on. label Mar 23, 2026

@sourcery-ai sourcery-ai bot left a comment


Hey - I've left some high level feedback:

  • The _resolve_file_uri helper does a synchronous os.path.exists check on every absolute path, which adds blocking filesystem I/O to the hot path; consider trusting absolute paths without existence checks and letting the protocol/backend handle errors instead.
  • The expression raw = segment.url or segment.file if isinstance(segment, Image) else segment.file is a bit hard to read and relies on operator precedence; splitting this into an explicit if isinstance(segment, Image) branch will make the intended behavior for url vs file unambiguous.
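
To make the precedence point concrete: in Python a conditional expression binds more loosely than `or`, so the one-liner parses as `(segment.url or segment.file) if isinstance(segment, Image) else segment.file`. The sketch below shows the explicit branching the review asks for next to the one-liner. The `Image` and `Record` dataclasses and both function names are hypothetical stand-ins for illustration, not AstrBot's real message components.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Image:  # hypothetical stand-in for the real Image segment
    file: Optional[str] = None
    url: Optional[str] = None


@dataclass
class Record:  # hypothetical stand-in for the real Record segment
    file: Optional[str] = None


def pick_raw(segment: Union[Image, Record]) -> Optional[str]:
    """Explicit form: Image prefers url over file; Record only has file."""
    if isinstance(segment, Image):
        return segment.url or segment.file
    return segment.file


def pick_raw_one_liner(segment: Union[Image, Record]) -> Optional[str]:
    """Original one-liner; parses as (url or file) if Image else file."""
    return segment.url or segment.file if isinstance(segment, Image) else segment.file
```

Both functions return the same value for every input, so the rewrite is about readability rather than a behavior change.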


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

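This pull request brings a notable performance optimization by avoiding unnecessary base64 encoding of image and voice segments in aiocqhttp. The new _resolve_file_uri method correctly identifies URIs that can be passed through directly and handles absolute file paths, falling back to base64 encoding only when necessary. The implementation is clear and addresses the performance problem directly. I have one small suggestion about the import statements to improve code style and consistency.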

@@ -1,4 +1,5 @@
import asyncio
import os
medium

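Since pathlib is used both in the new _resolve_file_uri method and elsewhere in the file, it is best to make it a top-level import here, which follows PEP 8 and improves code clarity.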

Suggested change
-import os
+import os
+import pathlib


# Bare path: convert it to a file:// URI so the protocol endpoint reads it itself
if os.path.isabs(raw) and os.path.exists(raw):
    import pathlib
medium

With pathlib imported at the top of the file, the local import here is no longer needed.

@FlanChanXwO

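The local-path sending logic may be problematic

Consider the case where AstrBot and NapCat are not on the same server, or their data volumes are not shared. The existence check in your _resolve_file_uri function is evaluated against AstrBot's own environment, but without a shared volume NapCat has no way of knowing whether the file really exists. This means _from_segment_to_dict has a problem: in a cross-service deployment, sending by local file path never reaches the fallback path.

Using the os module to read files may cause concurrency problems

os module operations are synchronous, so multiple threads queue up in order waiting to read files. This may end up cancelling out the local file sending optimization.

Suggestion 1

You could catch the send-failure exception and, once caught, go through the unified base64 flow. I don't recommend catching exceptions to implement the fallback, though, because it is not very efficient.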

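Suggestion 2

I suggest adding a config option: in astrbot/core/config/default.py, give the aiocqhttp adapter a switch for unified base64 conversion. It should be on by default so it stays compatible with older versions. You then choose the sending method based on the option, so there is no need to raise an exception, catch it, and fall back. Users who want better send performance can read the description and turn it off if they judge that safe.

"prefer_base64": {
    "description": "Prefer Base64 when sending media",
    "type": "bool",
    "hint": "When enabled, image and voice media files are sent Base64-encoded, which ensures cross-server compatibility. When disabled, a local file path or network URL is preferred, with automatic fallback to Base64 on failure. Only disable this when AstrBot and the protocol endpoint run on the same machine.",
},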

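Suggestion 3

I suggest using the anyio library for file reads. Async I/O has less overhead and performs better, and there are no locking problems. I see AstrBot already has this library installed, so it can be used.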
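
Suggestions 2 and 3 could be combined along these lines. This is only a sketch under stated assumptions: `build_local_file_field` is a hypothetical helper, it covers local paths only (HTTP downloads and the failure fallback described in the hint are omitted), and it uses the standard library's `asyncio.to_thread` as a stand-in for anyio so that the example has no third-party dependency.

```python
import asyncio
import base64
import pathlib


async def build_local_file_field(path: str, prefer_base64: bool = True) -> str:
    """Build the `file` field for a local media file (hypothetical helper).

    prefer_base64=True  -> embed the bytes, works across servers.
    prefer_base64=False -> send a file:// URI, which is only valid when the
                           protocol endpoint can see the same filesystem.
    """
    if not prefer_base64:
        return pathlib.Path(path).resolve().as_uri()
    # Read in a worker thread so the event loop is not blocked
    # (stdlib stand-in for the anyio-based read suggested above).
    data = await asyncio.to_thread(pathlib.Path(path).read_bytes)
    return "base64://" + base64.b64encode(data).decode("ascii")
```

With the default `prefer_base64=True` the behavior matches the pre-PR path, which is the backward-compatibility argument made in Suggestion 2.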

@he-yufeng

he-yufeng commented Mar 23, 2026

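Right, I hadn't considered the cross-server scenario: the local exists check passes, but NapCat can't get at the file at all.

Suggestion 2 is the most reasonable: add a prefer_base64 config option, on by default to stay compatible. I've made the change and pushed it, and I also switched the file reading to aiofiles.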

- Add `prefer_base64` config option (default: true) for aiocqhttp adapter.
  When enabled, always use base64 encoding for media to ensure cross-server
  compatibility. When disabled, try file:// and http:// passthrough first.
- Replace sync `os.path.exists` with `aiofiles.os.path.exists` to avoid
  blocking the event loop on concurrent requests.
- Fix operator precedence ambiguity in _resolve_file_uri by splitting the
  Image/Record branching into explicit if/else.
- Move `pathlib` import to module level (PEP 8).

Addresses review feedback from @FlanChanXwO and @sourcery-ai.


Development

Successfully merging this pull request may close these issues.

[Feature] Optimize the aiocqhttp adapter's image sending mechanism to avoid forced base64 conversion, improving performance and reducing memory usage

2 participants